5-Year Impact Factor: 0.9
Volume 35, 12 Issues, 2025
Letter to the Editor, November 2025

Reducing Unnecessary Free Thyroid Hormone Testing

By Xuan Zhang, Guo-Ming Zhang

Affiliations

  1. Department of Laboratory Medicine, Shuyang Hospital, the Affiliated Shuyang Hospital of Xuzhou Medical University, Shuyang, China
doi: 10.29271/jcpsp.2025.11.1491

Sir,

We had the honour of reading the paper by Murthy et al., "Reducing Unnecessary Free Thyroid Hormone Testing by the Reinforcement of a Reflexive Algorithm in an Outpatient Environment", published in Clinical Biochemistry.1 Given rising medical expenses, this type of graded testing is worth recommending. The strategy reflexively tests free T4 (FT4) and free T3 (FT3) when TSH is below the lower limit of the reference interval (LLR) and tests FT4 when TSH is above the upper limit of the reference interval (ULR); it is similar to the approach used in our previous work.2 However, we would like to raise a question supported by our own data: is it appropriate to use the ULR and LLR of TSH as the triggers for reflex testing of FT4 or FT3? The LLR and ULR of TSH are the 2.5th and 97.5th percentiles, respectively, of TSH levels in a healthy population. They describe the distribution of TSH in healthy people and were not chosen to discriminate between normal and abnormal free thyroid hormone levels. Therefore, the LLR and ULR of TSH may not represent optimal thresholds for the reflexive detection of FT4 or FT3.3-5
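The reflex rule described above can be written as a short sketch. This is only an illustration under stated assumptions: the function name `reflex_tests` is hypothetical, the default limits are the TSH reference limits reported for our laboratory (0.34 and 5.60 mIU/L), and the code is not the implementation used by Murthy et al. or by any LIS.

```python
from typing import List

# TSH reference interval limits (mIU/L) reported for our laboratory.
TSH_LLR = 0.34
TSH_ULR = 5.60


def reflex_tests(tsh: float, low: float = TSH_LLR, high: float = TSH_ULR) -> List[str]:
    """Return the free thyroid hormone tests to add on for one TSH result.

    Hypothetical helper: FT4 and FT3 are added when TSH is below `low`,
    FT4 alone when TSH is above `high`, and nothing otherwise.
    """
    if tsh < low:
        return ["FT4", "FT3"]
    if tsh > high:
        return ["FT4"]
    return []


print(reflex_tests(0.20))             # low TSH: both free hormones
print(reflex_tests(5.30))             # inside the reference interval: no reflex
print(reflex_tests(5.30, high=5.03))  # same result with a lower upper trigger
```

Passing the trigger values as parameters makes the question raised here concrete: the same TSH result of 5.30 mIU/L triggers no reflex at the ULR of 5.60 mIU/L, but does trigger FT4 when the upper threshold is set to 5.03 mIU/L, the ROC-derived value in Table I.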

We collected all outpatient orders for thyroid function tests (TSH, FT3, and FT4) from the laboratory information system (LIS) of the Department of Laboratory Medicine, Shuyang Hospital, the Affiliated Shuyang Hospital of Xuzhou Medical University, Shuyang, China, from December 2019 to 2024. In this laboratory, the reference intervals are 0.34-5.60 mIU/L for TSH, 7.86-14.41 pmol/L for FT4, and 3.80-6.00 pmol/L for FT3. The thyroid function tests were performed on the DXI800 immunoassay system (Beckman Coulter) and are accredited to the ISO 15189 standard for medical laboratories. We collected data from 42,020 pairs of thyroid function tests. We also adopted the strategy used by Murthy et al. The algorithm aims to reflexively test FT4 and FT3 when TSH values are low or high, since not all FT3 results are normal even when FT4 is normal. However, we determined the optimal thresholds from the receiver operating characteristic (ROC) curve, rather than presetting the LLR and ULR as the thresholds. All statistical analyses were performed with MedCalc software. We defined an abnormal FT3 or FT4 level as abnormal thyroid function.
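For readers unfamiliar with ROC-based threshold selection, the following is a minimal sketch of choosing the cut-off that maximises the Youden index (J = sensitivity + specificity - 1). It is not MedCalc's implementation; the function name `youden_optimal_cutoff` is hypothetical, and the toy values are invented for illustration and are not patient data.

```python
from typing import Optional, Sequence, Tuple


def youden_optimal_cutoff(
    values: Sequence[float], abnormal: Sequence[bool]
) -> Tuple[Optional[float], float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A result counts as test-positive when value > cutoff, which matches the
    high-TSH direction; negate the values to study the low-TSH direction.
    """
    n_pos = sum(1 for a in abnormal if a)
    n_neg = len(abnormal) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one abnormal and one normal case")

    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, a in zip(values, abnormal) if a and v > cutoff)
        true_neg = sum(1 for v, a in zip(values, abnormal) if not a and v <= cutoff)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Invented toy data: TSH (mIU/L) and whether FT3 or FT4 was abnormal.
toy_tsh = [1.2, 2.0, 3.1, 4.4, 5.1, 5.9, 7.3, 9.0]
toy_abnormal = [False, False, False, False, True, False, True, True]

cutoff, j = youden_optimal_cutoff(toy_tsh, toy_abnormal)
print(f"cut-off > {cutoff} mIU/L, Youden index {j:.2f}")
```

The exhaustive scan over observed values is quadratic in the number of results, which is acceptable for a sketch; a production analysis would sort once and sweep the counts.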

Table I: ROC curve of TSH for abnormal free thyroid hormone levels and the number of abnormal thyroid hormones.

| Groups | Hypothyroidism (FT3 below LLR and/or FT4 below LLR) | Hypothyroidism (FT3 below LLR and/or FT4 below LLR) | Hyperthyroidism (FT3 above ULR and/or FT4 above ULR) | Hyperthyroidism (FT3 above ULR and/or FT4 above ULR) |
|---|---|---|---|---|
| Parameters | ROC | ULR | ROC | LLR |
| n | 42020 | 42020 | 42020 | 42020 |
| Number of positive cases with normal TSH levels | 1822 | 1822 | 4147 | 4147 |
| Number of positive cases | 3718 | 3718 | 9010 | 9010 |
| Optimal thresholds | 5.03 mIU/L | 5.60 mIU/L | 0.72 mIU/L | 0.34 mIU/L |
| Number of positive cases with TSH below or above optimal thresholds | 1701 | 1629 | 5280 | 4587 |
| AUC | 0.721 (95% CI: 0.717-0.725) | N/A | 0.779 (95% CI: 0.775-0.783) | N/A |
| Youden index | 0.3619 | 0.3590 | 0.4563 | 0.4390 |
| Sensitivity (%) | 45.8 (95% CI: 44.2-47.4) | 43.8 (95% CI: 42.2-45.4) | 58.7 (95% CI: 57.7-59.7) | 51.1 (95% CI: 50.1-52.1) |
| Specificity (%) | 90.4 (95% CI: 90.1-90.7) | 92.1 (95% CI: 91.8-92.4) | 86.9 (95% CI: 86.5-87.3) | 92.8 (95% CI: 92.5-93.1) |
| Positive predictive value (%) | 31.7 (95% CI: 30.7-32.7) | 35.0 (95% CI: 33.9-36.2) | 55.1 (95% CI: 54.2-55.9) | 65.9 (95% CI: 64.9-66.9) |
| Negative predictive value (%) | 94.5 (95% CI: 94.3-94.7) | 94.4 (95% CI: 94.3-94.6) | 88.5 (95% CI: 88.3-88.8) | 87.4 (95% CI: 87.2-87.7) |

LLR: Lower limit of reference interval; ULR: Upper limit of reference interval.

Figure 1: Concordance correlation coefficient between TSH and FT3 or FT4.

Our results showed that the optimal TSH threshold for reflex testing of FT4 and/or FT3 lies inside the reference interval: below the ULR on the high side (5.03 vs. 5.60 mIU/L) and above the LLR on the low side (0.72 vs. 0.34 mIU/L). The Youden index at the ROC-derived threshold was greater than that at the corresponding reference interval limit (0.3619 vs. 0.3590 and 0.4563 vs. 0.4390), indicating that the ROC curve method is better suited to obtaining the optimal thresholds. Moreover, many patients with normal TSH levels had abnormal FT4 and/or FT3 levels, as shown in Table I. This suggests that reflexively ordering FT4 and/or FT3 on the basis of TSH may be inappropriate.3-5 In addition, because FT3 and FT4 show only a weak negative correlation with TSH (Figure 1), and based on our many years of experience with reflex testing, we speculate that TSH-triggered reflex testing of FT3 or FT4 may have limited feasibility.
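The Youden indices compared above follow directly from the sensitivities and specificities in Table I. As a brief arithmetic check (the dictionary layout and the function name `youden_index` are illustrative assumptions; the numbers are copied from Table I):

```python
# Sensitivity and specificity (%) as reported in Table I.
TABLE_I = {
    "hypothyroidism, ROC threshold 5.03 mIU/L": (45.8, 90.4),
    "hypothyroidism, ULR 5.60 mIU/L": (43.8, 92.1),
    "hyperthyroidism, ROC threshold 0.72 mIU/L": (58.7, 86.9),
    "hyperthyroidism, LLR 0.34 mIU/L": (51.1, 92.8),
}


def youden_index(sensitivity_pct: float, specificity_pct: float) -> float:
    """Youden index J = sensitivity + specificity - 1, from percentages."""
    return (sensitivity_pct + specificity_pct) / 100.0 - 1.0


for label, (sens, spec) in TABLE_I.items():
    print(f"{label}: J = {youden_index(sens, spec):.3f}")
```

Because the inputs are rounded to one decimal place, the recomputed values agree with the reported indices only to about three decimals. The gain from the ROC-derived threshold is small on the high-TSH side (about 0.003) and larger on the low-TSH side (about 0.017), in each case trading some specificity for sensitivity.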

In conclusion, we present an interesting problem that bears on both clinical practice and medical costs. Reflex ordering can be easily set up in the LIS via two-way communication to achieve automatic ordering.

COMPETING INTEREST:
The authors declared no conflict of interest.

AUTHORS’ CONTRIBUTION:
XZ, GMZ: Writing, review, and editing of the original draft.
Both authors approved the final version of the manuscript to be published.

REFERENCES

  1. Murthy S, Scott J, Lu S, Zhang D, Vanstone JR, Berry WE, et al. Reducing unnecessary free thyroid hormone testing by the reinforcement of a reflexive algorithm in an outpatient environment. Clin Biochem 2025; 137:110919. doi: 10.1016/j.clinbiochem.2025.110919.
  2. Zhang GM, Guo XX, Higgins T, Xu Q, Cembrowski G. Limiting the testing of aspartate aminotransferase: Using the proper upper limit of the reference interval of alanine aminotransferase. Am J Clin Pathol 2016; 145(4):575-6. doi: 10.1093/ajcp/aqw069.
  3. Srivastava R, Bartlett WA, Kennedy IM, Hiney A, Fletcher C, Murphy MJ. Reflex and reflective testing: Efficiency and effectiveness of adding on laboratory tests. Ann Clin Biochem 2010; 47(Pt 3):223-7. doi: 10.1258/acb.2010.009282.
  4. Biondi B, Cappola AR, Cooper DS. Subclinical hypothyroidism: A review. JAMA 2019; 322(2):153-60. doi: 10.1001/jama.2019.9052.
  5. Kosmulski M. Are you in top 1% (1 per thousand)? Scientometrics 2018; 114(2):557-65. doi: 10.1007/s11192-017-2526-4.